MRI radiomics based on deep learning automated segmentation to predict early recurrence of hepatocellular carcinoma

Objectives To investigate the utility of deep learning (DL) automated segmentation-based MRI radiomic features and clinical-radiological characteristics in predicting early recurrence after curative resection of a single hepatocellular carcinoma (HCC). Methods This single-center, retrospective study included consecutive patients with surgically proven HCC who underwent contrast-enhanced MRI before curative hepatectomy from December 2009 to December 2021. Using 3D U-net-based DL algorithms, automated segmentation of the liver and HCC was performed on six MRI sequences. Radiomic features were extracted from the tumor, tumor border extensions (5 mm, 10 mm, and 20 mm), and the liver. A hybrid model incorporating the optimal radiomic signature and preoperative clinical-radiological characteristics was constructed via Cox regression analyses for early recurrence. Model discrimination was characterized with the C-index and the time-dependent area under the receiver operating characteristic curve (tdAUC) and compared with the widely adopted BCLC and CNLC staging systems. Results Four hundred and thirty-four patients (median age, 52.0 years; 376 men) were included. Among all radiomic signatures, HCC with 5 mm tumor border extension and liver showed the optimal predictive performance (training set C-index, 0.696). By incorporating this radiomic signature, rim arterial phase hyperenhancement (APHE), and incomplete tumor “capsule,” a hybrid model demonstrated a validation set C-index of 0.706 and a higher 2-year tdAUC (0.743) than both the BCLC (0.550; p < 0.001) and CNLC (0.635; p = 0.032) systems. This model stratified patients into two prognostically distinct risk strata (both datasets p < 0.001). Conclusion A preoperative imaging model incorporating the DL automated segmentation-based radiomic signature with rim APHE and incomplete tumor “capsule” accurately predicted early postsurgical recurrence of a single HCC.
Critical relevance statement The DL automated segmentation-based MRI radiomic model with rim APHE and incomplete tumor “capsule” holds the potential to facilitate individualized risk estimation of postsurgical early recurrence in a single HCC. Key Points A hybrid model integrating an MRI radiomic signature was constructed for early recurrence prediction of HCC. The hybrid model demonstrated a higher 2-year AUC than the BCLC and CNLC systems. The low-risk group identified by the model had longer RFS.


Supplementary Material 1 MRI Technique
Magnetic resonance imaging (MRI) with hepatobiliary contrast agent (HCA) was conducted using four 3.0-T systems (GE SIGNA™ Architect; GE SIGNA™ Premier; GE Discovery MR 750; Siemens MAGNETOM Skyra) and one 1.5-T system (uMR588). Additionally, MRI with extracellular contrast agent (ECA) was performed with five 3.0-T systems (Siemens MAGNETOM Skyra; Siemens TrioTim; GE SIGNA™ Architect; GE Discovery MR 750; Philips Ingenia Elition X) and two 1.5-T systems (Siemens Avanto; uMR588).
The liver MRI protocols consisted of T2-weighted imaging, diffusion-weighted imaging (b values: 0-1200 s/mm²) with apparent diffusion coefficient (ADC) maps, T1-weighted in- and opposed-phase imaging, and dynamic T1-weighted imaging before and after injection of contrast agent in the late arterial phase, portal venous phase (60 s), delayed phase (ECA MRI; 180 s) or transitional phase (HCA MRI; 180 s), and hepatobiliary phase (HCA MRI; 20 minutes). The arterial phase images were obtained either by acquisition triggered 7 s after arrival of the contrast bolus in the celiac trunk or by a multiple arterial phase (MAP) imaging technique. Specifically, the MAP images were obtained with an 18 s breath hold 20 s after the contrast agent injection, and further reconstructed with a temporal resolution of 3 s.
For HCA MRI, gadoxetate disodium (Primovist®; Bayer Schering Pharma AG) was administered intravenously at 1.0-2.0 ml/s (0.025 mmol/kg of body weight), followed immediately by a 20-30 ml saline flush. For ECA MRI, gadopentetate dimeglumine (Magnevist®; Bayer Schering Pharma AG), gadoterate meglumine (Dotarem®; Guerbet), or gadobenate dimeglumine (MultiHance®; Bracco) was administered intravenously at 2.5 ml/s (0.1 mmol/kg of body weight). MRI sequences and parameters are detailed in Table S1.

Dataset for the Development of Automated Segmentation Models
To develop automated deep learning (DL) segmentation models, a total of 1889 patients with focal liver lesions (FLLs; i.e., hepatocellular carcinoma, hemangioma, and hepatic cyst) from six tertiary hospitals in China between December 2013 and February 2021 were included. Patients were allocated into the training set (n=1511), validation set (n=189), and test set (n=189) at a ratio of 8:1:1. Magnetic resonance (MR) images in DICOM format were exported from the picture archiving and communication system. Two abdominal radiologists, both with 5 years of experience in liver MRI, performed manual segmentation of FLLs on T2-weighted imaging, diffusion-weighted imaging (b value of 800 s/mm²), in- and opposed-phase imaging, and pre- and post-contrast-enhanced T1-weighted imaging during the late arterial phase, portal venous phase, delayed phase (MRI with extracellular contrast agent), transitional phase, and hepatobiliary phase (MRI with hepatobiliary contrast agent), avoiding intrahepatic vasculature. The two radiologists segmented 944 and 945 patients, respectively. For quality control of the manual segmentation, all regions of interest (ROIs) were inspected by a senior radiologist with 30 years of experience in liver MRI. When a segmentation was deemed unsatisfactory, manual adjustments were carried out by two junior radiologists. The resulting annotated images were used as input data for training the automated segmentation models.
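The 8:1:1 allocation can be illustrated with a minimal sketch. The random shuffling, the seed, and the helper name `split_patients` are assumptions for illustration; the source reports only the ratio and the resulting set sizes.

```python
import random


def split_patients(patient_ids, seed=0):
    """Shuffle patient IDs and split them 8:1:1 (hypothetical helper)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # assumed: random allocation with a fixed seed
    n_val = round(0.1 * len(ids))
    n_test = round(0.1 * len(ids))
    n_train = len(ids) - n_val - n_test  # the remainder goes to training
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# 1889 patients, as reported in the text
train_ids, val_ids, test_ids = split_patients(range(1889))
print(len(train_ids), len(val_ids), len(test_ids))  # 1511 189 189
```

Rounding each 10% share to the nearest integer and assigning the remainder to training reproduces the reported set sizes.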

Automated Segmentation Model Training, Validation and Test
The automated segmentation models were trained using a sequential modular approach. Initially, a three-dimensional convolutional neural network (3D-CNN) model [1] was employed to generate a liver segmentation mask. During this phase, the algorithm defined the liver region on MR images, isolating it from adjacent abdominal organs to enable a focused analysis. This algorithm involved an encoder-decoder architecture with 3D convolutions and pooling layers, complemented by Rectified Linear Unit (RLU) activation and batch normalization. Skip connections were built between corresponding layers of the encoder and decoder. The output layer included
Insights Imaging (2024) Wei H, Zheng T, Zhang X, et al.
Lesion segmentation was achieved using a 3D-UNet framework, characterized by an encoder-decoder architecture with 3D convolutions and pooling layers. This framework incorporated Rectified Linear Unit (RLU) activation and batch normalization. The encoder and decoder, each comprising four layers of 3D conv-bn-RLU, were interconnected. After the final decoder layer, a 3D conv-bn-RLU layer produced the final lesion segmentation prediction. The model employed the Adam optimizer with an initial learning rate of 0.001, which was reduced by a factor of 0.1 every 30 epochs; training ended after 60 epochs.
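The learning-rate schedule described for the lesion segmentation model can be written out as a step decay. This is a minimal sketch that assumes "reduced by a factor of 0.1" means the rate is multiplied by 0.1 at epochs 30, 60, and so on; the function name `step_decay_lr` is hypothetical and no particular DL framework is implied.

```python
def step_decay_lr(epoch, initial_lr=0.001, factor=0.1, step_size=30):
    """Learning rate at a given 0-based epoch under step decay (assumed form)."""
    return initial_lr * factor ** (epoch // step_size)


# 60 training epochs, as reported in the text
schedule = [step_decay_lr(epoch) for epoch in range(60)]
print(schedule[0], schedule[29], schedule[30], schedule[59])
```

Under this assumption, epochs 0 to 29 use 0.001 and epochs 30 to 59 use approximately 0.0001.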
The validation set was used to tune hyperparameters such as the learning rate and batch size. This iterative process identified the hyperparameters that gave the best results on data not used for training, which helped improve the model's generalizability.
The accuracy of the automated liver and lesion segmentation models was evaluated on the test set. The mean Dice similarity coefficient (DSC) between the automated and manual liver segmentations was 0.95±0.11, with a range of 0.79 to 0.99 across all sequences. In addition, the mean DSC between the automated and manual lesion segmentations was 0.78±0.16, with a range of 0.59 to 0.96 across all sequences.
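For reference, the DSC between two binary masks A and B is 2|A∩B| / (|A| + |B|). The sketch below computes it on flattened masks; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy example: 3 automated voxels, 2 manual voxels, 2 shared
automated = [0, 1, 1, 1, 0, 0]
manual = [0, 1, 1, 0, 0, 0]
print(dice_coefficient(automated, manual))  # 0.8
```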

Image Acquisition, Preprocessing and Automated Segmentation
De-identified magnetic resonance (MR) images were uploaded to commercial visualization and analysis software (LiverMRDoc; version 2.10.0; Shukun Technology Co., Ltd).
Before automated segmentation, one radiologist (HW) inspected all MR images in terms of the sequence names, HCC lesions, and corresponding 3D bounding boxes (i.e., the automated lesion detection annotation) on the AI software platform. To ensure accurate localization of tumors, manual adjustment was conducted for 16 patients with inaccurate 3D bounding boxes (e.g., failing to detect HCC lesions or delineate the whole tumors).
Using 3D U-net-based DL algorithms as detailed in Supplementary Material 2 and
Before extracting radiomic features, voxels in each MR image volume were resampled to an isotropic voxel size of 1.0 × 1.0 × 1.0 mm³. This standardization helped minimize the impact of varying MR imaging conditions, such as pixel spacing and slice thickness.
For gray value discretization, a bin width of 25 was applied to the image volumes. Image intensity values were normalized to enhance the comparability and interpretability of the radiomic features. The normalization scale parameter was set to 1 to retain the original scale of the normalized images and preserve data integrity. To address geometric differences in liver MR images, a geometry tolerance parameter of 1e-5 was used during radiomic feature extraction.
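Fixed-bin-width discretization can be sketched as follows. The source gives only the bin width; the formula used here, floor(x / W) - floor(min / W) + 1, is the convention commonly used by radiomics toolkits and is an assumption about the actual implementation. The function name and the sample intensities are hypothetical.

```python
import math


def discretize_fixed_bin_width(intensities, bin_width=25):
    """Map intensities to 1-based gray-level bins of constant width (assumed formula)."""
    offset = math.floor(min(intensities) / bin_width)
    return [math.floor(value / bin_width) - offset + 1 for value in intensities]


# Hypothetical voxel intensities inside a region of interest
voxels = [30, 49, 50, 80.5]
print(discretize_fixed_bin_width(voxels))  # [1, 1, 2, 3]
```

With a fixed bin width, the number of gray levels depends on the intensity range within the region of interest rather than being fixed in advance.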
Extracted feature classes included shape features, first-order features, second-order features, and higher-order features. The shape-based features described the area, volume, perimeter, contour irregularity, and compactness (perimeter²/area) of the tumor or liver. The first-order statistics depicted the distribution of individual voxel values within the MR image without considering their spatial relationships. The second-order features (i.e., Gray Level Co-occurrence Matrix [GLCM] features) conveyed additional information about texture by considering relationships between the intensities of neighboring voxel pairs. The higher-order features (e.g., Gray Level Run Length Matrix [GLRLM] features, Gray Level Size Zone Matrix [GLSZM] features, and Gray Level Distance Zone Matrix [GLDZM] features) provided sophisticated patterns and textural information, highlighting the relationships among multiple voxels [1].
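To make the notion of a second-order feature concrete, the sketch below builds a co-occurrence count table for a tiny 2D image of discretized gray levels and derives the standard GLCM contrast, the sum of p(i, j)·(i − j)² over gray-level pairs. The single horizontal offset, the symmetric counting, the toy image, and the function names are assumptions for illustration; the study extracted features from 3D volumes.

```python
from collections import Counter


def glcm_counts(image, offset=(0, 1), symmetric=True):
    """Count gray-level pairs separated by `offset` (rows, cols) in a 2D image."""
    d_row, d_col = offset
    counts = Counter()
    for r in range(len(image)):
        for c in range(len(image[r])):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < len(image) and 0 <= c2 < len(image[r2]):
                pair = (image[r][c], image[r2][c2])
                counts[pair] += 1
                if symmetric:
                    counts[(pair[1], pair[0])] += 1  # also count the reversed pair
    return counts


def glcm_contrast(counts):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over all gray-level pairs."""
    total = sum(counts.values())
    return sum((i - j) ** 2 * n for (i, j), n in counts.items()) / total


toy_image = [[1, 1, 2],
             [2, 2, 1]]
toy_counts = glcm_counts(toy_image)
print(glcm_contrast(toy_counts))  # 0.5
```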

Radiomic Feature Normalization and Abnormal Feature Exclusion
Radiomic feature normalization, abnormal feature exclusion, feature selection, and radiomic signature construction were performed with R software (version 4.

Feature Selection
After normalization and exclusion of abnormal features, we followed a four-step procedure to reduce dimensionality and select robust radiomic features on the training set.
First, intervariable collinearity was estimated by Spearman correlation analysis. For radiomic features with a Spearman's rank correlation coefficient >0.8, hierarchical feature clustering was performed to remove redundancy. These features were clustered into 1 to N classes, where N was defined as the number of features divided by 3. To determine the optimal number of clusters, the mean Silhouette Coefficient (mSC) was used, whereby a higher value denoted better clustering quality. Each resulting cluster was represented by the one feature with the largest range of values among the clustered features. As such, the representative features of all clusters plus the features with a Spearman's rank correlation coefficient ≤0.8 were entered into further analyses. The cluster configurations and representative features generated from the training set were applied to the test set, because all parameters of the radiomic signatures must be fixed during model building and applied to the test set with the same thresholds.
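Two ingredients of this first step can be sketched in plain Python: the Spearman rank correlation between two features, and the choice of a cluster representative by largest value range. The hierarchical clustering and the silhouette-based choice of cluster number are omitted, and the original analysis was run in R, so this is an illustration under those assumptions rather than the authors' code. The function and feature names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def representative(cluster):
    """Pick the feature with the largest range of values within a cluster."""
    return max(cluster, key=lambda name: max(cluster[name]) - min(cluster[name]))


# Hypothetical cluster of two monotonically related features
cluster = {"feature_1": [0.1, 0.4, 0.9, 1.2], "feature_2": [1.0, 3.0, 8.0, 20.0]}
rho = spearman(cluster["feature_1"], cluster["feature_2"])
print(rho > 0.8, representative(cluster))
```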
Subsequently, univariable Cox regression analysis was performed to identify significant radiomic features associated with early recurrence. Features with P<0.01 were kept for further analyses.
Next, random survival forest (RSF) was applied to select the top 20 features. The measure of variable importance (VIMP) was used to rank the importance of variables [2], with a higher value indicating greater importance.
Finally, based on the top 20 features derived from the RSF, radiomic signatures were constructed by multivariable Cox regression analysis using a backward elimination approach with five-fold cross-validation.

Radiomic Signature Development and Validation
Eight groups of radiomic signatures were built for predicting early recurrence based on different combinations of radiomic features extracted from tumor, tumor border extensions (5 mm, 10 mm, and 20 mm), and liver parenchyma, including:

The output layer of the liver segmentation model included two branches for liver boundary segmentation and pixel-level liver region segmentation, respectively. Following liver segmentation and anatomical delineation, image registration was implemented by aligning multiple MRI sequences with a standardized spatial reference framework, thereby enhancing spatial coherence between the liver segmentation masks across MRI sequences. To improve FLL detection accuracy, segmented liver images were used as input data for a lesion detection algorithm.
Subsequently, a deep learning algorithm was developed for automated FLL detection in each sequence of contrast-enhanced MR images. The core of the algorithm was a 3D-CNN model known as the Unified Multi-Sequence Lesion Detector (MSLD), which comprised two primary elements: (a) a series of Single Lesion Detectors (SLDs) for independent lesion detection in each sequence, and (b) a False Positive Reduction (FPR) module to mitigate false alarms in identified lesions. Using the MSLD model, each detected lesion was annotated by a bounding box in each sequence.
To address MRI sequence diversity, we devised SLDs tailored for each sequence, extending the Mask region-based convolutional neural network (R-CNN) [2] framework to process 3D input images. Four SLDs, sharing the same architecture, accommodated variations in tissue appearance across sequence groups, including pre-contrast T1WI, post-contrast T1WI, T2WI, and DWI. The SLD framework incorporated the Region Proposal Network (RPN), ROI alignment, lesion identification, and segmentation modules. The introduction of an adaptive receptive field enabled global feature extraction within slices, and the Feature Pyramid Network (FPN) [3] captured multi-scale information for robust perceptual capabilities. Training involved normalization of the preprocessed images (2×2×2 mm³ spacing), cropping into 160×160×160 patches, and use of the Adam optimizer for 200 epochs with a batch size of eight. The initial learning rate was 0.001, decaying every 50 epochs.
Multiple sequences were used to minimize the impact of image artifacts on lesion detection. After automated FLL detection by the SLDs, bounding boxes from the various sequences were cross-referenced to identify candidate lesions. To reduce false alarms from artifacts, a dedicated FPR module integrated a 3D-CNN for feature extraction from each ROI, followed by feature integration from multiple sequences for binary predictions. ROI dimensions were standardized to 32×32×32 to ensure uniformity for typical lesion sizes. Model training spanned 200 epochs with an initial learning rate of 0.001, decayed by 0.1 every 30 epochs, and a batch size of 64.

Using 3D U-net-based DL algorithms as detailed in Supplementary Material 2 and Fig. S1, automated segmentation of liver and HCC lesions was conducted on each transverse section of T2-weighted imaging (T2WI), in phase (IP), opposed phase (OP), arterial phase (AP), portal venous phase (PVP), and delayed phase (DP; for MRI with extracellular contrast agent [ECA]) or transitional phase (TP; for MRI with hepatobiliary contrast agent [HCA]) images. For quality control, one radiologist (HW) visually inspected each segmented tumor and liver, and those (n=40) with inaccurate tumor or liver segmentations on any of the above sequences were excluded from radiomic analyses. The exclusion criteria for inaccurate segmentation were (a) tumor region of interest (ROI) covered nontumoral areas (e.g., liver parenchyma, benign cysts, adjacent organs or tissues) (n=18); (b) tumor ROI failed to cover the whole tumor areas (n=8); (c) liver ROI failed to cover the whole tumor or liver areas (n=6); and (d) liver ROI covered areas beyond the liver (n=8). Examples of inaccurate image segmentations are presented in Fig. 2. Manual adjustment was not considered because the study aimed to examine the prognostic utility of this automated technique. To assess the accuracy of automated DL segmentation, one radiologist (TYZ) who was blinded to the automated segmentation results manually segmented 30 randomly chosen HCC lesions using ITK-SNAP (version 3.8.0; www.itksnap.org).
3.1; The R Foundation for Statistical Computing). Values of extracted radiomic features on the training set were normalized to z scores; the means and standard deviations derived from the training set were applied to normalize the features of the test set. Abnormal features with a variance of 0 were excluded from further analyses. Variance measures the degree of dispersion of values; a variance of 0 indicates that the feature values were the same in all patients. The number of abnormal features ranged from 1552 to 3142 across the radiomic signatures.
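The normalization and exclusion rule can be sketched as follows, with the key point that the means and standard deviations are estimated on the training set only and then reused for the test set. The original analysis was run in R; this Python sketch, its function names, the toy values, and the use of the sample standard deviation (n − 1 denominator) are assumptions for illustration.

```python
from statistics import mean, pvariance, stdev


def fit_normalizer(train_features):
    """Return {feature: (mean, sd)} for features with non-zero training variance."""
    params = {}
    for name, values in train_features.items():
        if pvariance(values) == 0:  # "abnormal" feature: same value in all patients
            continue
        params[name] = (mean(values), stdev(values))  # assumed: sample SD (n - 1)
    return params


def apply_normalizer(features, params):
    """Z-score the retained features using the training-set parameters."""
    return {
        name: [(value - mu) / sd for value in features[name]]
        for name, (mu, sd) in params.items()
    }


# Hypothetical feature tables: feature -> values per patient
train = {"feature_a": [1.0, 2.0, 3.0], "feature_b": [5.0, 5.0, 5.0]}
test = {"feature_a": [4.0], "feature_b": [7.0]}

params = fit_normalizer(train)
train_z = apply_normalizer(train, params)
test_z = apply_normalizer(test, params)
print(train_z, test_z)
```

In this toy example `feature_b` is dropped from both sets because its training variance is 0, even though its test value differs.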
(a) HCC, (b) HCC with 5 mm tumor border extension, (c) HCC with 10 mm tumor border extension, (d) HCC with 20 mm tumor border extension, (e) HCC and liver, (f) HCC with 5 mm tumor border extension and liver, (g) HCC with 10 mm tumor border extension and liver, and (h) HCC with 20 mm tumor border extension and liver. The optimal radiomic signature that exhibited the highest performance was selected for building the hybrid model.
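Signature performance is reported in the abstract as the C-index. As an illustration of what that metric measures, the sketch below implements Harrell's C-index for right-censored data in plain Python. It is a simplified version that skips pairs with tied times, and the function name and toy data are hypothetical; it is not the authors' R implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the higher-risk patient recurs first."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(len(times)):
            if times[i] < times[j]:  # patient i recurred before patient j's last follow-up
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: months to recurrence or censoring, event flag, radiomic score
months = [5, 10, 15]
recurred = [1, 1, 0]
scores = [3.0, 2.0, 1.0]
print(harrell_c_index(months, recurred, scores))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of recurrence times by predicted risk.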

Figure S1 3D-Unet Architectures of Liver and Tumor Segmentation Models. RLU = rectified linear unit, ROI = region of interest.

Figure S2 Representative Images of a Patient with HCC at High Risk of Early Recurrence Determined by the Hybrid Model. Pathologically confirmed moderately differentiated HCC in a 50-year-old man without microvascular invasion. MRI with extracellular contrast agent demonstrated a 5.2 cm HCC in liver segment VII. The tumor (*) showed mild to moderate hyperintensity on (A) T2-weighted image, diffusion restriction on (B) diffusion-weighted image (b = 1500 s/mm²), hypointensity on (C) in phase image, rim APHE on (D) late arterial phase image, non-smooth tumor margin on (E) portal venous phase image, and incomplete tumor "capsule" (arrowhead) on (F) delayed phase image. This patient had two risk factors (rim APHE and incomplete tumor "capsule") for early recurrence, with a radiomic score of -0.11 points. The final calculated score according to the hybrid model was 1.67 points, corresponding to the high-risk group (≥1.25 points). Early recurrence occurred after a follow-up period of 8.7 months. APHE, arterial phase hyperenhancement; HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging.
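The risk assignment in this example can be expressed as a simple threshold rule. This minimal sketch uses only the cutoff stated in the caption (≥1.25 points for the high-risk group); how the hybrid score itself is computed from its components is not given in this section and is not reproduced here, and the function name is hypothetical.

```python
HIGH_RISK_CUTOFF = 1.25  # points, as stated in the Figure S2 caption


def risk_group(hybrid_score, cutoff=HIGH_RISK_CUTOFF):
    """Assign a risk stratum from a precomputed hybrid model score."""
    return "high-risk" if hybrid_score >= cutoff else "low-risk"


print(risk_group(1.67))  # high-risk (the patient shown in Figure S2)
```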

Table S3 Number of Radiomic Features in Each Step of Feature Selection on the Training Set

Table S4 Eight Groups of Radiomic Signatures for Prediction of Early Recurrence according to Multivariable Cox Regression Analyses

Table S1 MRI Sequences and Parameters

Top 20 Features Entered into Multivariable Cox Regression Analysis Radiomic Signature